Speeding up External Mergesort

Authors

  • LuoQuan Zheng
  • Per-Åke Larson

Abstract

External mergesort is normally implemented so that each run is stored contiguously on disk and blocks of data are read exactly in the order they are needed during merging. We investigate two ideas for improving the performance of external mergesort: interleaved layout and a new reading strategy. Interleaved layout places blocks from different runs in consecutive disk addresses, in the hope that interleaving will reduce seek overhead during merging. The new reading strategy precomputes the order in which data blocks are to be read according to where they are located on disk and when they are needed for merging. Extra buffer space makes it possible to read blocks in an order that reduces seek overhead, instead of reading them exactly in the order they are needed for merging. A detailed simulation model was used to compare the two layout strategies and three reading strategies (double buffering, forecasting, and the new strategy). The effects of using multiple work disks were also investigated. We found that, in most cases, interleaved layout does not improve performance, but that the new reading strategy consistently performs better than double buffering and forecasting.
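The abstract does not spell out the algorithms, so what follows is only a toy sketch under stated assumptions, not the authors' simulation model. It assumes each block is tagged with its last (largest) key. From those tags it uses the classic forecasting rule to derive the order in which a k-way merge needs blocks. It then assigns disk addresses under either a contiguous or an interleaved layout. The reading strategy is modelled with a hypothetical batching rule: with a lookahead of `window` extra buffers, the next `window` needed blocks are read in ascending disk-address order. Seek cost is approximated as total head movement in block addresses, which ignores rotational latency and transfer time. All function names are illustrative.

```python
import heapq
from typing import Dict, List, Tuple

Block = Tuple[int, int]  # (run index, block index within the run)


def consumption_order(runs: List[List[int]]) -> List[Block]:
    """Order in which a k-way merge needs blocks, by the forecasting rule.

    runs[r][b] is the last (largest) key of block b in run r. Block b + 1 of
    run r is needed when block b is exhausted, and blocks are exhausted in
    increasing order of their last keys.
    """
    order = [(r, 0) for r, run in enumerate(runs) if run]  # first blocks
    heap = [(run[0], r, 0) for r, run in enumerate(runs) if run]
    heapq.heapify(heap)
    while heap:
        _, r, b = heapq.heappop(heap)  # the block that is exhausted next
        if b + 1 < len(runs[r]):
            order.append((r, b + 1))
            heapq.heappush(heap, (runs[r][b + 1], r, b + 1))
    return order


def contiguous_layout(runs: List[List[int]]) -> Dict[Block, int]:
    """Each run occupies consecutive disk addresses, one run after another."""
    addr: Dict[Block, int] = {}
    for r, run in enumerate(runs):
        for b in range(len(run)):
            addr[(r, b)] = len(addr)  # next free address
    return addr


def interleaved_layout(runs: List[List[int]]) -> Dict[Block, int]:
    """Block b of every run is placed before block b + 1 of any run."""
    addr: Dict[Block, int] = {}
    for b in range(max((len(run) for run in runs), default=0)):
        for r, run in enumerate(runs):
            if b < len(run):
                addr[(r, b)] = len(addr)
    return addr


def planned_reads(order: List[Block], addr: Dict[Block, int], window: int) -> List[Block]:
    """Hypothetical read plan: split the need order into batches of `window`
    blocks (the extra buffer space) and read each batch by disk address."""
    plan: List[Block] = []
    for i in range(0, len(order), window):
        plan.extend(sorted(order[i:i + window], key=lambda blk: addr[blk]))
    return plan


def seek_distance(reads: List[Block], addr: Dict[Block, int], head: int = 0) -> int:
    """Total head movement, in block addresses, to perform `reads` in order."""
    total = 0
    for blk in reads:
        total += abs(addr[blk] - head)
        head = addr[blk]
    return total


if __name__ == "__main__":
    runs = [[10, 20, 30], [15, 25, 35]]  # last key of each block
    order = consumption_order(runs)
    contig = contiguous_layout(runs)
    print(seek_distance(order, contig))                            # 13
    print(seek_distance(planned_reads(order, contig, 4), contig))  # 9
    print(seek_distance(order, interleaved_layout(runs)))          # 5
```

In this toy input the two runs are consumed in strict alternation, which is the best case for interleaving. The abstract reports that, in most cases, interleaved layout does not improve performance, so these numbers illustrate the mechanics only.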

Similar resources

Buffering and Read-Ahead Strategies for External Mergesort

The elapsed time for external mergesort is normally dominated by I/O time. This paper is focused on reducing I/O time during the merge phase. Three new buffering and readahead strategies are proposed, called equal buffering, extended forecasting and clustering. They exploit the fact that virtually all modern disks perform caching and sequential readahead. The latter two also collect information...

‘Runsort’—An Adaptive Mergesort for Prolog

This note describes a novel list-sorting method for Prolog which is stable, has O(n log n) worst-case behaviour and O(n) best-case behaviour. The algorithm is an adaptive variant of bottom-up mergesort using so-called long runs of preexisting order to improve efficiency; accordingly we have called it ‘runsort’. Runsort compares favourably with samsort, and a modification to samsort is suggested.
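Here is a rough Python sketch of run-adaptive (natural) mergesort. It is a generic illustration, not the note's Prolog runsort. It first detects maximal non-decreasing runs, then merges adjacent runs pairwise, bottom-up, until one remains. Already-sorted input forms a single run and costs one linear scan. The pairwise passes keep the worst case at O(n log n). Merging only adjacent runs, and taking from the left run on ties, keeps the sort stable. The names `find_runs` and `natural_mergesort` are illustrative.

```python
from typing import List, TypeVar

T = TypeVar("T")


def find_runs(xs: List[T]) -> List[List[T]]:
    """Split xs into maximal non-decreasing runs (preexisting order)."""
    runs: List[List[T]] = []
    for x in xs:
        if runs and runs[-1][-1] <= x:
            runs[-1].append(x)  # extend the current run
        else:
            runs.append([x])  # order breaks: start a new run
    return runs


def merge(a: List[T], b: List[T]) -> List[T]:
    """Stable two-way merge: on ties, take from a (the earlier run)."""
    out: List[T] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if b[j] < a[i]:
            out.append(b[j])
            j += 1
        else:
            out.append(a[i])
            i += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def natural_mergesort(xs: List[T]) -> List[T]:
    """Bottom-up mergesort seeded with the input's natural runs."""
    runs = find_runs(xs)
    if not runs:
        return []
    while len(runs) > 1:
        merged = [merge(runs[k], runs[k + 1]) for k in range(0, len(runs) - 1, 2)]
        if len(runs) % 2:
            merged.append(runs[-1])  # odd run out waits for the next pass
        runs = merged
    return runs[0]


if __name__ == "__main__":
    print(natural_mergesort([3, 1, 2, 5, 4]))  # [1, 2, 3, 4, 5]
```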

Limit theorems for mergesort

Central and local limit theorems (including large deviations) are established for the number of comparisons used by the standard top-down recursive mergesort under the uniform permutation model. The method of proof utilizes Dirichlet series, Mellin transforms and standard analytic methods in probability theory.
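The quantity studied there is the number of key comparisons made by top-down recursive mergesort on a uniformly random permutation. It can be measured empirically with a small instrumented sort. This Python sketch splits at n // 2, which is one common convention; others split at ⌈n/2⌉. It counts one comparison per merge step. The names `mergesort_count` and `sample_comparisons` are illustrative. The sketch only estimates the distribution by simulation and does not reproduce the paper's analytic method.

```python
import random
from statistics import mean, pstdev
from typing import List, Tuple


def mergesort_count(xs: List[int]) -> Tuple[List[int], int]:
    """Top-down recursive mergesort; returns (sorted copy, key comparisons)."""
    n = len(xs)
    if n <= 1:
        return list(xs), 0
    mid = n // 2
    left, c_left = mergesort_count(xs[:mid])
    right, c_right = mergesort_count(xs[mid:])
    out: List[int] = []
    i = j = comparisons = 0
    while i < len(left) and j < len(right):
        comparisons += 1  # one key comparison per merge step
        if right[j] < left[i]:
            out.append(right[j])
            j += 1
        else:
            out.append(left[i])
            i += 1
    out.extend(left[i:])  # leftovers are copied without comparisons
    out.extend(right[j:])
    return out, c_left + c_right + comparisons


def sample_comparisons(n: int, trials: int, seed: int = 0) -> List[int]:
    """Comparison counts on `trials` uniformly random permutations of size n."""
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        counts.append(mergesort_count(perm)[1])
    return counts


if __name__ == "__main__":
    counts = sample_comparisons(n=1000, trials=200)
    print(f"mean = {mean(counts):.1f}, std = {pstdev(counts):.2f}")
```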

Improving Mergesort for Linked Lists

We present a highly tuned mergesort algorithm that improves the cost bounds when used to sort linked lists of elements. We provide empirical comparisons of our algorithm with other mergesort algorithms. The paper also illustrates the sort of techniques that allow one to speed up a divide-and-conquer algorithm.
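For contrast with the tuned algorithm the paper describes, here is a plain textbook top-down mergesort on a singly linked list, sketched in Python. It finds the midpoint with slow/fast pointers, cuts the list in two, and merges by relinking nodes, so no element is copied. None of the paper's specific tunings is reproduced. `Node`, `sort_list`, and the helper names are illustrative.

```python
from typing import Iterable, List, Optional


class Node:
    """Singly linked list node."""

    __slots__ = ("val", "next")

    def __init__(self, val, next: "Optional[Node]" = None):
        self.val = val
        self.next = next


def merge_lists(a: Optional[Node], b: Optional[Node]) -> Optional[Node]:
    """Stably merge two sorted lists by relinking their nodes."""
    dummy = tail = Node(None)
    while a is not None and b is not None:
        if b.val < a.val:
            tail.next, b = b, b.next
        else:  # ties take from a, keeping the sort stable
            tail.next, a = a, a.next
        tail = tail.next
    tail.next = a if a is not None else b  # append whatever remains
    return dummy.next


def sort_list(head: Optional[Node]) -> Optional[Node]:
    """Top-down mergesort on a linked list; returns the new head."""
    if head is None or head.next is None:
        return head
    # Slow/fast pointers: slow stops at the last node of the first half.
    slow, fast = head, head.next
    while fast is not None and fast.next is not None:
        slow, fast = slow.next, fast.next.next
    second, slow.next = slow.next, None  # cut the list in two
    return merge_lists(sort_list(head), sort_list(second))


def from_iterable(values: Iterable) -> Optional[Node]:
    head = None
    for v in reversed(list(values)):
        head = Node(v, head)
    return head


def to_list(head: Optional[Node]) -> List:
    out = []
    while head is not None:
        out.append(head.val)
        head = head.next
    return out


if __name__ == "__main__":
    print(to_list(sort_list(from_iterable([5, 1, 4, 2, 3]))))  # [1, 2, 3, 4, 5]
```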

A Meticulous Analysis of Mergesort Programs

The efficiency of mergesort programs is analysed under a simple unit-cost model. In our analysis the time performance of the sorting programs includes the costs of key comparisons, element moves and address calculations. The goal is to establish the best possible time-bound relative to the model when sorting n integers. By the well-known information-theoretic argument n log₂ n - O(n...
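The information-theoretic argument referred to can be written out as follows. This is the standard derivation, not taken from the paper, and C(n) is an assumed symbol for the worst-case number of comparisons. A comparison sort must distinguish all n! input orders, and a binary decision tree of height C(n) has at most 2^C(n) leaves. That gives a lower bound of the n log₂ n - O(n) form the abstract starts to state:

```latex
\begin{aligned}
2^{C(n)} &\ge n! && \text{(at most } 2^{C(n)} \text{ leaves)} \\
n! &\ge (n/e)^{n} && \text{(since } e^{n} = \textstyle\sum_{k \ge 0} n^{k}/k! \ge n^{n}/n!\text{)} \\
C(n) &\ge \log_2 n! \ge n \log_2 n - n \log_2 e \approx n \log_2 n - 1.443\,n
\end{aligned}
```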

Journal:
  • IEEE Trans. Knowl. Data Eng.

Volume: 8

Publication date: 1996